Reading device with hierarchal navigation

ABSTRACT

In some embodiments, disclosed is a reading device that comprises a camera, at least one processor, and a user interface. The camera scans at least a portion of a document having text to generate a raster file. The processor processes the raster file to identify text blocks. The user interface allows a user to hierarchically navigate the text blocks when they are read to the user.

BACKGROUND

The embodiments of the invention relate to a reading machine for impaired persons such as individuals who are visually impaired or have dyslexia.

People with disabilities, such as impaired vision or dyslexia, may have difficulty reading printed material. Automatic systems are needed to render documents as audio recordings.

It is known to provide a mobile print digitizer for the visually impaired. One known device captures printed documents and reads them to the user. A camera or scanner captures an image of a printed page, and optical character recognition (OCR) is then run on the image. The output is fed to a speech synthesizer such as a text-to-speech system (TTS). Unfortunately, existing systems can be inefficient in allowing a user to navigate a document as the reader reads it to the user. Accordingly, new approaches are desired.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

FIG. 1 is a diagram of a reading apparatus in accordance with some embodiments.

FIG. 2 is a flow diagram of a routine for implementing a reading apparatus in accordance with some embodiments.

FIG. 3A is a flow diagram of a routine for characterizing OCR'd blocks in accordance with some embodiments.

FIG. 3B is a flow diagram of a routine for characterizing OCR'd blocks in accordance with other embodiments.

FIG. 4 is an exemplary portion of a scanned document showing block boundaries after the document has been OCR'd in accordance with some embodiments.

FIG. 5 shows the block boundaries from FIG. 4 with assigned main and ancillary block designations including hierarchal level in accordance with some embodiments.

FIG. 6 is an exemplary portion of a scanned document, with multiple articles, showing block boundaries after the document has been OCR'd in accordance with some embodiments.

FIG. 7 shows the block boundaries from FIG. 6 with assigned main and ancillary block designations including hierarchal level in accordance with some embodiments.

DETAILED DESCRIPTION

One of the challenges for users of reading devices is that with audio playback, textual hierarchy (e.g., outline organization) that would otherwise be conveyed, e.g., via size and formatting seen by visual users viewing the text, may be lost. Accordingly, in some embodiments, a reading apparatus with an auditory presentation that preserves at least some of the intended hierarchy of information is provided. For example, in some embodiments, the user can navigate a given text by skipping through chunks of text delineated by differences in font size or weight, or by commonly recognized words or symbols that indicate natural break points.

FIG. 1 shows a block diagram of a reading apparatus 102 to read to a user a document 101 to be scanned in accordance with some embodiments. Reader 102 generally comprises a processor 104, user interface 106, camera 108, memory 110, and auditory output device 112, coupled together as shown.

The processor and memory may comprise any suitable combination of memory and processing circuits, components, or combinations of the same to execute routines to control the reader 102. The memory 110 comprises device control (DC) software code 111 to control the reader 102 and execute its various functions. In the depicted embodiment, the device control code has several modules including optical character recognition (OCR) and text-to-speech (TTS) modules. The OCR module further includes a characterization (CZN) module. There may be more modules and in some embodiments, the modules may not necessarily be related to each other as shown.

The device control program controls scanning (digitized document acquisition), reading navigation, and general system functionality. The OCR module converts the pre-text (e.g., rasterized scanned image) document into text data, characterizes it, and reads it to a user through a convenient navigation interface. (As used herein, “reading” means to convey or provide text in an audio form to a user.)

The camera may comprise any suitable device such as a charge coupled device (CCD) camera to acquire a raster image of the text document 101, as is known in the art. It may scan a document line by line, section by section, or it may image an entire page or sheet at once. Likewise, the auditory device 112 could comprise any suitable device to auditorily convey the read text to the user. For example, it could comprise one or more speakers and/or audio interface ports for connection to headphones or the like.

The user interface 106 may constitute any suitable components, known or not yet developed, to allow a user to conveniently control the reader. For example, the user interface could comprise one or more buttons, wheels, joysticks or other input control components that allow a user to manually control the reader without necessarily being able to see the user interface (i.e., control the components by feeling them). In some embodiments, the user interface includes a five button interface, such as that shown in FIG. 1, with up (“U”), down (“D”), left (“L”), right (“R”), and select (“S”) buttons, to allow a user to conveniently navigate hierarchically through a document, as addressed more below.

The user interface could also include input ports (which may also function as output ports) such as universal serial bus (USB), so-called “Firewire”, and/or wireless ports, e.g., to allow a user to import an electronic document that may or may not be in a text format. For example, portable document format (PDF) documents (or the like) could be imported for auditory reading to a user. In addition, the user interface could include speech-to-text capability, e.g., a microphone with suitable speech-to-text engine. Of course, as with any feature, especially those requiring substantial hardware and/or processing, trade-offs must be made between cost, power consumption, operating efficiency, performance accuracy, and feature capability.

FIG. 2 shows a routine for implementing a reader such as the reader 102 of FIG. 1. For example, it could be implemented with the software modules shown in FIG. 1. In some embodiments, conventional OCR and TTS modules may be used in cooperation with a CZN module designed to perform methods and concepts discussed herein. (This comprises developing an OCR with a suitable characterization capability, appropriately modifying and/or configuring an “off-the-shelf” OCR program to have suitable characterization, or another feasible approach.)

At 201, a digitized document with text to be read is acquired. This may be done through the importation of an electronic document or by scanning a document such as text-containing document 101 using the camera 108. The digital document file should be in a suitable form for the utilized OCR module. For example, many OCR packages typically accept raster image formats commonly found throughout the document management industry such as TIF, BMP, PCX and DCX, to mention just a few. Depending upon particular design considerations, a suitable OCR solution may be used to accept input from other sources such as fax input formats, PDF (or similar) formats, or common scanner driver formats such as TWAIN and ISIS.

At 203, optical character recognition (OCR) and characterization are performed on the acquired document file. Any suitable OCR tool (or module), presently available or specifically developed, capable of suitably identifying text layout and format attributes may be employed. Currently available OCR tools are generally flexible enough to conform to needed performance for techniques discussed herein.

OCR modules generally perform text recognition by defining an area within a frame to be converted and then processing it and examining the results. They typically define vector bounding boxes around blocks or sections of text such as individual characters, sentences, lines, paragraphs, etc. This is illustrated in FIGS. 4 and 6. The bounding boxes identify text block layout and can also be used to identify format attributes such as font size and the like. For example, when a bounding box indicates the dimensions of a character, its font attributes or even style attributes can be determined directly using the character dimensions, character image mapping, character-to-area ratios, or the like. Other well-known methods can also be used for the OCR module to recognize text format attributes. For example, reference may be made to U.S. Pat. No. 6,741,745 to Dance et al., which is incorporated by reference herein.
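As a rough illustration of deriving a format attribute from bounding boxes, the following sketch estimates a point size from character-box heights. The `CharBox` type, the `estimate_font_size_pt` name, and the 300 dpi default are assumptions made for illustration and are not taken from any particular OCR module. Box height is only a proxy for font size, since glyph heights vary within a font.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CharBox:
    """Hypothetical character bounding box, in pixels."""
    left: int
    top: int
    width: int
    height: int


def estimate_font_size_pt(boxes, dpi=300):
    """Estimate a point size from the median character-box height.

    Assumes the boxes come from one block and that the scan
    resolution (dpi) is known; 72 points equal one inch.
    """
    if not boxes:
        raise ValueError("no character boxes supplied")
    median_height_px = median(box.height for box in boxes)
    return median_height_px * 72.0 / dpi
```

Using the median rather than the mean keeps a few tall or short glyphs from skewing the estimate.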

OCR programs are usually fairly flexible in allowing designers to adjust software imaging attributes. Varying image attributes may enhance the manner in which the OCR software views text. For example, lowering the software, or display, resolution (i.e., not the scanned resolution) may allow the software to “see” a clearer image of the text, thus improving the initial chances of correct recognition. Configuration settings may also be varied based on given design considerations. They can affect the format characteristics of what is being read such as text style (e.g., plain, italic, numeric, or image), text size, and font type.

Most OCR software allows the user to set margins of acceptable error when attempting to recognize a text image. Similarly, confidence levels for recognition depending on results of first iterations may be used and adjusted depending upon design considerations. Confidence levels are measures of certainty. Depending upon desired operating performance, two or more different confidence levels could be used.
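One way two confidence levels might be applied is sketched below. The function name `triage_words`, the threshold values, and the three outcome names are hypothetical; the sketch assumes the OCR module reports a confidence between 0 and 1 for each recognized word.

```python
def triage_words(words, accept_at=0.90, retry_at=0.60):
    """Sort (text, confidence) pairs into accept / retry / reject bins.

    accept_at and retry_at are illustrative thresholds, not values
    taken from any specific OCR package.
    """
    bins = {"accept": [], "retry": [], "reject": []}
    for text, confidence in words:
        if confidence >= accept_at:
            bins["accept"].append(text)      # use as recognized
        elif confidence >= retry_at:
            bins["retry"].append(text)       # candidate for another pass
        else:
            bins["reject"].append(text)      # treat as unrecognized
    return bins
```

Words in the "retry" bin could, for example, be re-processed with different imaging attributes before being read.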

Error detection methodologies are also commonly employed. For example, error detection tools include spell checks and automatic corrections based on the software or user specific dictionaries. Various properties such as uppercase words, numeric expressions, roman numerals, proper nouns, math functions, abbreviations and acronyms may also be compared against appropriate more-particular dictionaries. Resolution of patterns involving text and numerals may be varied according to ambiguities dominant in the patterns.

With embodiments discussed herein, characterization, possibly among other things, involves characterizing OCR'd text blocks so that they may be read using hierarchy to allow a user to more efficiently navigate desired text to be read. More on this will be discussed with reference to FIGS. 3A, 3B, and 4-7 below.

Next, at 205, the OCR'd and characterized text is read to the user. Any suitable text-to-speech (TTS) solution may be used. In some embodiments, the user is allowed to navigate through the text sequentially or hierarchically. For example, in a hierarchal mode, with the user interface of FIG. 1, the right and left buttons could be used to move ahead or behind in equivalent levels (e.g., heading-to-heading in an article, title-to-title in a newspaper or magazine, or from chapter-to-chapter in a book). Likewise, the up and down buttons could be used for progressing to higher or lower levels, respectively. For example, when a desired heading is found, the user could press the down button to start reading the main paragraph text blocks under that heading. In the following sections, more on characterization will be addressed.
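The button behavior described above can be sketched as a cursor over an ordered list of blocks. The `Navigator` class and its method names are hypothetical, and the sketch assumes each block carries a numeric level in which 1 is the highest level (e.g., a title) and larger numbers are lower levels. It also assumes that left and right simply seek the nearest block at the same level, which is one of several reasonable interpretations.

```python
class Navigator:
    """Cursor over (level, text) blocks; level 1 is the highest level."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.pos = 0

    def current(self):
        return self.blocks[self.pos][1]

    def _seek(self, indices, wanted):
        # Move to the first index whose level satisfies `wanted`;
        # stay in place if no such block exists.
        for i in indices:
            if wanted(self.blocks[i][0]):
                self.pos = i
                break
        return self.current()

    def right(self):
        level = self.blocks[self.pos][0]
        ahead = range(self.pos + 1, len(self.blocks))
        return self._seek(ahead, lambda lv: lv == level)

    def left(self):
        level = self.blocks[self.pos][0]
        behind = range(self.pos - 1, -1, -1)
        return self._seek(behind, lambda lv: lv == level)

    def up(self):
        # Go back to the nearest preceding block at a higher level.
        level = self.blocks[self.pos][0]
        behind = range(self.pos - 1, -1, -1)
        return self._seek(behind, lambda lv: lv < level)

    def down(self):
        # Enter the next block only if it sits at a lower level.
        level = self.blocks[self.pos][0]
        nxt = self.pos + 1
        if nxt < len(self.blocks) and self.blocks[nxt][0] > level:
            self.pos = nxt
        return self.current()
```

In a device, each method would be bound to the corresponding button and its return value passed to the TTS module.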

FIG. 3A shows an exemplary approach for performing characterization 301. At 302, the routine identifies blocks as being a main text block, an ancillary text block, or as some other type of block (e.g., an image block). Main text blocks are blocks of text that are part of a sequential (e.g., intended beginning-to-end) reading path. On the other hand, ancillary text is considered text such as figure descriptions, byline or bio type information, article identification information such as title, publication, page number, date, and the like. (These are not rigid definitions but rather, may vary depending on the accuracy of the OCR and text identification methods, for example, so that characterization is not so sensitive as to omit relevant information depending on a utilized navigation approach.)

Distinguishing between main and ancillary text blocks can allow for more efficient sequential reading capabilities. For example, a user might select to read sequentially only the main text in an article. In this way, ancillary text such as figure descriptors or footnotes could be referenced but not read so that the main text of the article could be more efficiently ingested.

(As used herein, the term “article” refers generally to any distinct text subject. For example, an article could be an actual article in a magazine or newspaper, or it could be a menu, a whole book, a prescription tag, a bill, or a receipt.)

Main and ancillary text blocks can be identified in any suitable way and will likely involve application of several different criteria. For example, continuity content analysis could be used. Alternatively or in addition, simpler characteristics such as relative layout and font characteristics, along with simple heuristics and/or assumptions could be applied. For example, text blocks proximal to an image but different from a majority of text on a page could be presumed to be figure description text. Other types of ancillary text blocks could be identified in ways more particular to their specific attributes. For example, footnote references usually are numeric and are located in a superscript relationship at the end of a sentence or word. When a footnote is identified, the likelihood that its encompassing text block is a main text block is increased (i.e., may assume encompassing block is main text). In addition, the routine will know to “look” for a corresponding numeric (probably at the lower end of the page) leading a block of text that will likely be smaller than the encompassing text. It could then be reasonably assumed that this block(s) is ancillary text (and associated with the footnote reference). Other combinations of suitable criteria and/or assumptions could be used for similar or different types of ancillary (or main) text, and the invention should not be so limited.
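The footnote heuristic described above might be sketched as follows. The `Block` fields and the `link_footnotes` name are hypothetical, and the sketch assumes the OCR stage has already reported which superscript numbers appear in each block and has normalized vertical position so that 0.0 is the top of the page and 1.0 is the bottom.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical OCR'd text block."""
    text: str
    font_size: float
    top: float                          # 0.0 = top of page, 1.0 = bottom
    superscripts: List[int] = field(default_factory=list)
    kind: str = "unknown"
    linked_to: Optional[int] = None     # index of the referencing block


_LEADING_NUMBER = re.compile(r"\s*(\d+)\b")


def link_footnotes(blocks):
    """Mark referencing blocks as main text and matching footnotes as ancillary."""
    for i, ref_block in enumerate(blocks):
        for number in ref_block.superscripts:
            # A footnote reference suggests the enclosing block is main text.
            ref_block.kind = "main"
            for cand in blocks:
                match = _LEADING_NUMBER.match(cand.text)
                if (cand is not ref_block
                        and match is not None
                        and int(match.group(1)) == number
                        and cand.top > 0.5                        # lower half of page
                        and cand.font_size < ref_block.font_size):
                    cand.kind = "ancillary"
                    cand.linked_to = i
    return blocks
```

The lower-half and smaller-font tests mirror the assumptions stated in the text; a real routine would likely weigh them rather than require both.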

At 304, the ancillary text is associated with appropriate main text blocks and/or with appropriate image or other types of blocks. This involves linking the ancillary text with the associated block so that it can be read if prompted or in a read mode where ancillary text is read. It also may involve linking (e.g., sequentially) the ancillary text with other related ancillary text. For example, byline type information such as author information, accreditation information, etc., could be linked as ancillary text to a common main title text block.
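Once ancillary blocks are linked to their associated blocks, a read mode can either skip them or read them alongside the block they belong to. The sketch below assumes a hypothetical dictionary representation with `id`, `kind`, `text`, and `parent` keys; none of these names come from the disclosure.

```python
def reading_order(blocks, include_ancillary=False):
    """Return the texts to read, in order.

    Main blocks are always read in sequence. When include_ancillary
    is True, each main block is followed by the ancillary blocks
    whose 'parent' is that block's id.
    """
    texts = []
    for block in blocks:
        if block["kind"] != "main":
            continue
        texts.append(block["text"])
        if include_ancillary:
            texts.extend(
                other["text"]
                for other in blocks
                if other["kind"] == "ancillary" and other.get("parent") == block["id"]
            )
    return texts
```

A "read if prompted" mode could use the same links, reading a block's ancillary text only when the user presses the select button.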

Ancillary text blocks may be associated with main text or other block types in any suitable way. As with other aspects of characterization, it may take into account (or even be driven by) the layout of the page. For example, the routine may follow printing rules, e.g., magazine, journal, or book specific printing rules. As already discussed, it should also consider unique characteristics of already identified blocks. For example, image blocks likely have associated descriptive ancillary blocks, footnote reference numbers will likely have associated footnote text, and a text block aligned with an identified title block may likely be ancillary byline text.

At 306, hierarchy levels are assigned to main text blocks. The assignments are made at least based on text format attributes. Any suitable attributes and/or filters can be used to assign different relative levels. For example, in some embodiments, text that is at least 10% larger than the majority of text in a different block could be tagged as a higher level. For a given article, there may be many different sizes and types, so alignment and style could also be taken into account. The content of the text itself could also be considered.
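A size-based level assignment along these lines might look like the following sketch. The `assign_levels` name and the input format are hypothetical. The sketch assumes the body size is the font size covering the most characters, treats any size at least 10% larger as a heading size, and ignores alignment, style, and content.

```python
from collections import Counter


def assign_levels(blocks, ratio=1.10):
    """Assign a hierarchy level to each (font_size, text) block.

    Level 1 is the largest heading size; body text (and anything
    not at least `ratio` times the body size) gets the lowest level.
    """
    # Body size: the size that accounts for the most characters.
    chars_per_size = Counter()
    for size, text in blocks:
        chars_per_size[size] += len(text)
    body_size = chars_per_size.most_common(1)[0][0]

    # Distinct heading sizes, largest first, ranked 1, 2, 3, ...
    heading_sizes = sorted(
        {size for size, _ in blocks if size >= body_size * ratio},
        reverse=True,
    )
    rank = {size: i + 1 for i, size in enumerate(heading_sizes)}
    body_level = len(heading_sizes) + 1
    return [rank.get(size, body_level) for size, _ in blocks]
```

For a page with a 24-point title, 14-point headings, and 10-point paragraphs, this yields levels 1, 2, and 3 respectively.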

For example, certain phrases may be identified as being more likely to be a header or title, rather than a narration such as in an article paragraph. Actual key words could also be recognized. For example, in a book, the phrase “Chapter n” could be interpreted as a chapter break, especially if not surrounded by words or if larger and/or with a different font than words proximal to it.
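A key-word test for a chapter break can be sketched with a regular expression. The `is_chapter_break` name is hypothetical, and the pattern only covers the case where “Chapter n” stands alone on a line with an Arabic or Roman numeral; spelled-out numbers and font cues are not handled.

```python
import re

# "Chapter" followed by digits or a Roman numeral, with nothing else on the line.
_CHAPTER = re.compile(r"\s*chapter\s+(\d+|[ivxlcdm]+)\s*", re.IGNORECASE)


def is_chapter_break(line):
    """Return True when the whole line looks like a chapter heading."""
    return _CHAPTER.fullmatch(line) is not None
```

Requiring a full-line match implements the "not surrounded by words" condition, so a sentence that merely mentions a chapter is not flagged.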

Headers, for example, could be detected by the combination of different size, font, and lack of ending punctuation. Magazine editors are currently fond of splashing text in the middle of articles, functioning as a skimming aid or attention-getter, which varies in size or color. Sometimes two or three sizes may be used with one of these sentences, as a way to grab attention. Sometimes the size changes are counter-intuitive (smaller text used to grab attention). Therefore, position of the text in the block could be a factor in the presentation, with text in a similar font, but different than the article body being called out as a heading.

(The dotted arrows in the flow diagram are meant to indicate that the depicted block processes do not necessarily occur sequentially. That is, they may be running concurrently with each other and be affected by characterizations, identifications, and OCR'ing that has already occurred. They also may affect how each other processes its tasks and certainly can affect how OCR'ing and even digitized document acquisition is occurring. For example, as already discussed, when a footnote reference is identified at block 306, this information may be made available to 302 and 304, which could influence them in characterizing an appropriate block, e.g., near bottom of page with leading matching numeric, to be characterized as associated ancillary text. As another example, in any aspect of characterization, the routine might instruct the OCR process to increase software resolution in a certain area if certain text is “expected” but not yet identified. Accordingly, it should be appreciated that the routine blocks and processes discussed herein may be performed in various orders and will likely affect each other, e.g., sequentially, iteratively, and/or otherwise.)

FIG. 3B shows another embodiment of a characterization routine, a modification of the routine of FIG. 3A, for performing block characterization 205. At 322, it determines if the scanned text corresponds to a single article (e.g., book, menu, instructions) or to multiple articles (e.g., a newspaper page or magazine section). This can allow it to use separate (particularized) heuristics for each situation to make characterization more efficient and/or more accurate. For example, if the routine knows that it is characterizing blocks constituting multiple articles, it may place more emphasis on searching for connecting navigational text such as continuation references and the like. This could even be incorporated into the OCR and/or target acquisition (e.g., scanning) processes, for example, to increase software or scanning resolution in areas where such connectors may be expected.

At 323, sequential identification is assigned to the blocks. This may comprise ordering the blocks (indicated with the vector bounding boxes) based on their relative positions to one another. For example, as with the examples of FIGS. 4 through 7, a left-to-right and top-to-bottom ordering scheme may be used.
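One simple reading of a left-to-right, top-to-bottom scheme is sketched below. The `sequence_blocks` name, the `(left, top)` pixel coordinates, and the `row_height` tolerance are assumptions; grouping by fixed-height bands is a simplification that can split blocks whose tops straddle a band boundary.

```python
def sequence_blocks(boxes, row_height=20):
    """Order block names top-to-bottom, then left-to-right within a row.

    boxes maps a block name to its (left, top) corner in pixels.
    Blocks whose tops fall in the same row_height band are treated
    as one row.
    """
    return sorted(
        boxes,
        key=lambda name: (boxes[name][1] // row_height, boxes[name][0]),
    )
```

The position of each name in the returned list (starting from 1) can then serve as the block's sequential identifier.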

At 324, the blocks are identified as main, ancillary, navigational, or other. Main and ancillary text may be identified as discussed above. Navigational text is text such as page numbers, column numbers, page or article continuation references (e.g., “continued from page 23”), and the like. They may be used to more efficiently piece together a sequence of main text blocks for a given article.
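Continuation references of the kind quoted above can be picked out with a pattern match. The `parse_continuation` name and the exact phrasing matched are assumptions; real publications use many variants (e.g., "see page", "cont'd") that this sketch does not cover.

```python
import re

_CONTINUATION = re.compile(r"continued\s+(from|on)\s+page\s+(\d+)", re.IGNORECASE)


def parse_continuation(text):
    """Return (direction, page) for a continuation reference, else None.

    direction is "from" or "on"; page is the referenced page number.
    """
    match = _CONTINUATION.search(text)
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))
```

A "from" result points back to the earlier part of an article and an "on" result points forward, which is the information needed to chain main text blocks across pages.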

At 326 and 330, ancillary block association and main block level assignments, as discussed previously, are performed. At 328, navigational text blocks are associated with other relevant blocks. For example, they may be associated with main or ancillary text (e.g., page numbers associated with narrative text, footnotes, etc., on the page) and/or they may be linked with other navigation blocks. For example, page numbers would likely be linked with previous and/or subsequent page numbers.

FIG. 4 shows a portion of an article with indicated block boundaries 402 to 426. This scanned document is from a single article with a title (402), byline information 404, headings (406, 409, 418, 422), footnote text (412), article identifier text 414, and page identifier 426. (Note that these block boundaries may be the product of additional block definition refinement after initial OCR bounding boxes have been generated. Depending on the utilized OCR module, bounding boxes may bound characters and words within sentences or paragraphs. Additional refinement may thus be desired to put these into more logically uniform blocks for reading to a user. For example, words could be put into sentence blocks and sentence blocks into paragraph blocks, etc. Note that the block outlines shown in FIGS. 5 and 7, for simplicity of presentation, do not show nested block boundaries within other boundaries, but such nesting may be implemented, regardless of whether separate vector boxes are actually generated. That is, blocks may include sub-blocks defined and/or identified in any suitable fashion. For example, paragraph main text blocks could include sub-block definition for sentences and words so that they may be separately read or skipped. Likewise, text blocks with footnote identifiers could include sub-block identification for the footnote numbers, as an example, so that they may be linked to the actual ancillary footnote text.)

FIG. 5 shows how the blocks from FIG. 4 may be assigned and associated, pursuant to the routines of FIG. 3A or 3B, in accordance with some implementations. As shown, they have been sequentially identified, independent of the type of box. (This numeric identifier is shown in the left or upper-left portion of each box.) Each main text block is identified with an “M”, while each ancillary block is identified with an “A”. The main text also has level designations with a sequence identifier for that level. For example, block 418, which is a heading, is the third level 2 block so it has an “L2:3” identifier.

Likewise, the ancillary blocks have association identifiers. For example, ancillary byline block 404 has an identifier of “A[1]:1”, whereby the brackets identify the block with which it is associated (in this case, block 1), while the “1” after the “:” indicates that it is the first ancillary block associated with block 1 (also referenced as 402).
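The two identifier forms described above, “L2:3” for main blocks and “A[1]:1” for ancillary blocks, can be decoded as in the sketch below. The `parse_label` name and the dictionary keys are hypothetical; only the two label shapes are taken from the description.

```python
import re

_MAIN_LABEL = re.compile(r"L(\d+):(\d+)")            # e.g. "L2:3"
_ANCILLARY_LABEL = re.compile(r"A\[(\d+)\]:(\d+)")   # e.g. "A[1]:1"


def parse_label(label):
    """Decode a block identifier into its parts.

    "L2:3"   -> main block, level 2, third block at that level.
    "A[1]:1" -> ancillary block, first one associated with block 1.
    """
    match = _MAIN_LABEL.fullmatch(label)
    if match:
        return {"kind": "main",
                "level": int(match.group(1)),
                "index": int(match.group(2))}
    match = _ANCILLARY_LABEL.fullmatch(label)
    if match:
        return {"kind": "ancillary",
                "parent": int(match.group(1)),
                "index": int(match.group(2))}
    raise ValueError(f"unrecognized block label: {label!r}")
```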

FIGS. 6 and 7 similarly show another example of a text document portion that has been characterized. In this example, the document portion is from a multiple article document (i.e., a portion of a newspaper in this example).

In the preceding description, numerous specific details have been set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques may have not been shown in detail in order not to obscure an understanding of the description. With this in mind, references to “one embodiment”, “an embodiment”, “example embodiment”, “various embodiments”, etc., indicate that the embodiment(s) of the invention so described may include particular features, structures, or characteristics, but not every embodiment necessarily includes the particular features, structures, or characteristics. Further, some embodiments may have some, all, or none of the features described for other embodiments.

The invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims.

It should be appreciated that example sizes/models/values/ranges may have been given, although the present invention is not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. Further, arrangements may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the present invention is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.

1. An apparatus, comprising: an acquisition device configured to acquire a digitization of at least a portion of a document, the document having a plurality of pages that each have a plurality of blocks of text; and at least one processor configured to characterize each of the plurality of blocks of text from each page of the document as a main text block or as an ancillary text block at least by identifying, on each of the plurality of pages, one or more blocks of text that corresponds to a figure descriptor or footnote and to present to a user multiple main text blocks sequentially without presenting any ancillary text block, the multiple main text blocks being from multiple pages of the plurality of pages.
2. (canceled)
3. The apparatus of claim 1, wherein the at least one processor discerns different hierarchal levels among the plurality of blocks of text of the plurality of pages based on the size of the text characters within the plurality of blocks of text.
4. The apparatus of claim 3, wherein the processor discerns different hierarchal levels based on the position of the text blocks relative to one another.
5. The apparatus of claim 4, in which the processor discerns different hierarchal levels based on text content within the text blocks.
6. The apparatus of claim 1, comprising memory having an OCR module to generate the text blocks when executed by the at least one processor.
7. The apparatus of claim 6, comprising memory having a characterization module to characterize the text blocks when executed by at least one processor.
8. The apparatus of claim 1, comprising an auditory device to provide to the user the read text in auditory form.
9. The apparatus of claim 5, in which the processor is to change a scanning resolution for an area associated with one of the text blocks based on a type of content expected in the one of the text blocks.
10-23. (canceled)